The 16MB BSON document limit fundamentally shapes MongoDB data modeling by forcing designers to consider document growth patterns, array sizes, and the choice between embedding and referencing to ensure documents stay within size constraints.
The 16MB BSON document limit (16,777,216 bytes) is a hard constraint in MongoDB that significantly influences schema design decisions. Rows in a relational database are typically small, whereas MongoDB's document model encourages embedding related data, and the 16MB limit puts a practical ceiling on how much you can embed. This affects how you model relationships and how you handle arrays, log data, and time-series information. Understanding the limit is crucial because the server rejects any insert or update that would produce a larger document, and even approaching the limit can degrade performance through increased memory pressure and longer network transfers.
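To make the size constraint concrete, the following is a minimal sketch of estimating a document's encoded size before writing it. It follows the public BSON layout for a small subset of types only (null, bool, int, float, str, dict, list); the helper names `bson_size` and `assert_fits` are hypothetical and not part of any driver. In practice a driver's own encoder (for example `bson.encode` in PyMongo) gives the authoritative size.

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # 16,777,216 bytes


def _value_size(val):
    """Size in bytes of one BSON value (subset of types only)."""
    if val is None:
        return 0                      # null carries no payload
    if isinstance(val, bool):         # check bool before int: bool is an int subclass
        return 1
    if isinstance(val, int):
        return 4 if -2**31 <= val < 2**31 else 8   # int32, otherwise int64
    if isinstance(val, float):
        return 8                      # 64-bit double
    if isinstance(val, str):
        return 4 + len(val.encode("utf-8")) + 1    # length prefix + bytes + NUL
    if isinstance(val, dict):
        return _document_size(val.items())
    if isinstance(val, (list, tuple)):
        # Arrays are encoded as documents keyed "0", "1", "2", ...
        return _document_size((str(i), v) for i, v in enumerate(val))
    raise TypeError(f"type not covered by this sketch: {type(val).__name__}")


def _document_size(items):
    total = 4 + 1                     # int32 length prefix + trailing NUL
    for key, val in items:
        # type byte + key as NUL-terminated string + value
        total += 1 + len(str(key).encode("utf-8")) + 1 + _value_size(val)
    return total


def bson_size(doc):
    """Estimated encoded size of a document given as a dict."""
    if not isinstance(doc, dict):
        raise TypeError("top-level value must be a dict")
    return _document_size(doc.items())


def assert_fits(doc, limit=MAX_BSON_SIZE):
    """Raise ValueError if the estimated size exceeds the limit."""
    size = bson_size(doc)
    if size > limit:
        raise ValueError(f"document is {size} bytes, over the {limit}-byte limit")
    return size


print(bson_size({"hello": "world"}))  # 22
```

Checking the size on the application side lets the code fall back to a different shape (for example a referenced collection) instead of discovering the problem as a failed write.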
Array size constraints: Embedded arrays need a known upper bound. The "Unbounded Array" anti-pattern occurs when an array can grow indefinitely, so the document eventually hits the 16MB limit. Solutions include moving the elements into their own collection and referencing them, or applying the bucket pattern.
Embedding vs. referencing trade-offs: The 16MB limit often tips the decision toward referencing rather than embedding. If a user can accumulate an unbounded number of orders, embedding them all risks eventually exceeding the limit, so storing orders in their own collection with a reference back to the user is the safer choice.
Document growth patterns: Documents that are frequently updated with new fields or array elements must be modeled with growth in mind. A field such as a lastSeen timestamp is overwritten in place and does not cause growth, but accumulating historical data does.
GridFS for large files: To store files larger than 16MB, MongoDB provides GridFS, which splits each file into chunks and keeps the chunks and the file metadata in two separate collections (fs.chunks and fs.files by default). GridFS is specialized for file storage and is not a general data modeling tool.
Performance considerations: Large documents consume more memory and bandwidth. Well before the limit is reached, documents over roughly 1-2 MB can degrade query performance, because the server reads the whole document into memory even when a query needs only a few of its fields.
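The move from embedding to referencing described above can be sketched with plain dictionaries. This is an illustration only: the function name `split_orders` and the field names `orders` and `user_id` are assumptions, and a real migration would write the resulting documents to two collections through a driver.

```python
def split_orders(user_doc, orders_field="orders", ref_field="user_id"):
    """Turn one user document with embedded orders into a slim user
    document plus one standalone document per order.

    Each order document carries a reference back to the user's _id,
    so the user document no longer grows as orders accumulate.
    """
    user = {k: v for k, v in user_doc.items() if k != orders_field}
    order_docs = [
        {**order, ref_field: user_doc["_id"]}
        for order in user_doc.get(orders_field, [])
    ]
    return user, order_docs


embedded = {
    "_id": 1,
    "name": "Ada",
    "orders": [
        {"order_id": 10, "total": 5},
        {"order_id": 11, "total": 7},
    ],
}

user, orders = split_orders(embedded)
print(user)    # {'_id': 1, 'name': 'Ada'}
print(orders)  # two order documents, each with user_id == 1
```

The trade-off is an extra query (or a `$lookup`) to read a user together with their orders, in exchange for a user document whose size stays bounded.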
To work within the 16MB limit, several patterns have emerged. The Subset Pattern embeds only the most recent or most frequently accessed data (e.g., the 10 most recent comments) and stores the full history in another collection. The Bucket Pattern groups time-series data into fixed-size windows, for example one document per sensor per hour holding 60 per-minute readings. The Extended Reference Pattern copies a small set of frequently accessed fields from a related document into the referencing document, so common reads need no second lookup. These patterns balance the benefits of embedding against the 16MB constraint.
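As an illustration of the Bucket Pattern, the sketch below groups individual readings into one document per sensor per hour. The document shape (`sensor_id`, `bucket_start`, `count`, `readings`) and the function name `bucket_readings` are assumptions made for the example, not a prescribed schema.

```python
from datetime import datetime, timedelta, timezone


def bucket_readings(readings):
    """Group (sensor_id, timestamp, value) tuples into hourly bucket documents.

    With one reading per minute, each bucket holds at most 60 readings,
    so the embedded array stays bounded regardless of how long the
    sensor runs.
    """
    buckets = {}
    for sensor_id, ts, value in readings:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = buckets.setdefault(
            (sensor_id, hour),
            {"sensor_id": sensor_id, "bucket_start": hour, "count": 0, "readings": []},
        )
        bucket["readings"].append({"ts": ts, "value": value})
        bucket["count"] += 1
    return list(buckets.values())


# 90 per-minute readings starting at 10:00 UTC span two hourly buckets.
start = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
sample = [("sensor-1", start + timedelta(minutes=i), 20.0 + i) for i in range(90)]
docs = bucket_readings(sample)
print([(d["bucket_start"].hour, d["count"]) for d in docs])  # [(10, 60), (11, 30)]
```

Because the bucket key is the sensor and the hour, the array length is capped by the sampling rate rather than by the age of the data.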
To understand the practical impact, consider a blog post document. Taking the limit as 16,777,216 bytes and ignoring per-element BSON overhead, comments averaging 500 bytes each would fill the document after roughly 33,500 comments. For a product catalog, if each product variant is 200 bytes, about 83,800 variants could fit. These limits might seem generous, but when documents contain multiple arrays or large embedded objects, the limit becomes a real constraint. A user document that embeds its full order history, with each order averaging 2KB, could hold only about 8,000 orders, which can be insufficient for high-volume e-commerce accounts.
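The capacity estimates above can be reproduced with a few lines of arithmetic. This sketch takes the limit as exactly 16,777,216 bytes and ignores BSON's per-element overhead (type byte and array index key), so the results are upper bounds; the helper name `max_elements` is hypothetical.

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # 16,777,216 bytes


def max_elements(element_bytes, base_bytes=0):
    """Upper bound on embedded elements of a given average size.

    base_bytes is the space already used by the rest of the document.
    """
    return (MAX_BSON_SIZE - base_bytes) // element_bytes


for label, size in [("comment", 500), ("variant", 200), ("order", 2048)]:
    print(f"{label}: {max_elements(size)}")
# comment: 33554
# variant: 83886
# order: 8192
```

Passing a non-zero `base_bytes` shows how quickly the budget shrinks when a document holds more than one large array.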